Skip to content

Conversation

@norio-nomura
Copy link
Contributor

@norio-nomura norio-nomura commented Nov 13, 2025

Check the SSH server in a way that complies with the SSH protocol using x/crypto/ssh.
This change fixes #4334 by falling back to usernet port forwarder on failing SSH connections over VSOCK.

  • pkg/networks/usernet: Rename entry point from /extension/wait_port to /extension/wait-ssh-server
    Because it changed to an SSH server-specific entry point.
    When a client accesses the old entry point, it fails and continues with falling back to the usernet forwarder.

  • pkg/sshutil: Add WaitSSHReady()
    WaitSSHReady waits until the SSH server is ready to accept connections.
    The dialContext function is used to create a connection to the SSH server.
    The addr, user parameter is used for ssh.ClientConn creation.
    The timeoutSeconds parameter specifies the maximum number of seconds to wait.

  • pkg/sshutil: Use hostKeyCollector().checker() instead of ssh.InsecureIgnoreHostKey()
    hostKeyCollector().checker():
    checker returns a HostKeyCallback that either checks and collects the host key,
    or only checks the host key, depending on whether any host keys have been collected.
    It is expected to pass host key checks by retrying after the first collection.
    On second invocation, it will only check the host key.

@norio-nomura norio-nomura force-pushed the vz-ssh-over-vsock-try-handshake branch 3 times, most recently from 16b4782 to 8ca744e Compare November 13, 2025 09:07
@norio-nomura
Copy link
Contributor Author

norio-nomura commented Nov 13, 2025

Expecting CI failures will be fixed by #4336 and #4341
Fixed.

@norio-nomura
Copy link
Contributor Author

norio-nomura commented Nov 14, 2025

Following change added to avoid CodeQL error:

  • feat: Generate SSH server keys in host agent and use them in guest OS
    Dropped

@norio-nomura norio-nomura force-pushed the vz-ssh-over-vsock-try-handshake branch from 0250449 to e4209f1 Compare November 17, 2025 09:16
@norio-nomura
Copy link
Contributor Author

Following change added to avoid CodeQL error:

  • feat: Generate SSH server keys in host agent and use them in guest OS

Dropped this feature, because https://github.com/lima-vm/alpine-lima does not support ssh_keys field in user-data, and #4333 depends on this PR won't work with that.

@norio-nomura norio-nomura force-pushed the vz-ssh-over-vsock-try-handshake branch from e4209f1 to 332dbba Compare November 17, 2025 09:23
@norio-nomura
Copy link
Contributor Author

norio-nomura commented Nov 17, 2025

Added:

  • CodeQL: Exclude go/insecure-hostkeycallback
    Split ssh.InsecureIgnoreHostKey() usage into pkg/sshutil/sshutil_use_insecure_hostkeycallback.go and
    exclude go/insecure-hostkeycallback on that file.
    We already ignore host key verification when executing SSH commands in Lima,
    by using UserKnownHostsFile=/dev/null and StrictHostKeyChecking=no in the SSH command arguments.

    Dropped.

@norio-nomura norio-nomura force-pushed the vz-ssh-over-vsock-try-handshake branch 2 times, most recently from 1d80083 to 3ae0d29 Compare November 19, 2025 07:38
for {
conn, err := dialContext(ctx)
if err == nil {
sshConn, chans, reqs, err := ssh.NewClientConn(conn, addr, sshConfig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is quite hard to maintain the code that mixes up /usr/bin/ssh with golang.org/x/crypto/ssh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks overengineering.

Can we consider another way to fix the issue #4334 ?

e.g., by writing vsock availability to a serial port

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I really want to do is #4333, and this is like a grounding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that #4333 and this can also be implemented without using host key generation and known_hosts.
If the host key generation and the known_hosts area disappear, I think the implementation will be simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., by writing vsock availability to a serial port

I tried to reconsider this, but I think it's difficult.
On Fedora 43's case:

  • Like ubuntu, where normal SSH over VSOCK can be used, systemd listens to VSOCK:22.
[norio@lima-fedora ~]$ sudo ss -l --vsock -p
Netid State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process                                     
v_str LISTEN 0      0                  *:22              *:*     users:(("systemd",pid=1,fd=228))           
v_str LISTEN 0      0                  3:2222            *:*     users:(("lima-guestagent",pid=1443,fd=13)) 
  • Accepted connection is denied by some mechanism(audit?).
    port availability detection log:

    [norio@lima-fedora ~]$ sudo journalctl|grep -E '[(1030|1037)]'
    Nov 20 08:32:47 lima-fedora audit[1030]: AVC avc: denied { getattr } for pid=1030 comm="sshd-session" scontext=system_u:system_r:sshd_session_t:s0-s0:c0.c1023 tcontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tclass=vsock_socket permissive=1
    Nov 20 08:32:47 lima-fedora sshd-session[1030]: banner exchange: Connection from UNKNOWN port 65535: Broken pipe

    Actual log on SSH connection via forwarded to VSOCK is denied:

    Nov 20 08:32:48 lima-fedora sshd-session[1037]: ssh_dispatch_run_fatal: Connection from UNKNOWN port 65535: Permission denied [preauth]
    Nov 20 08:32:48 lima-fedora audit[1037]: AUDIT1112 pid=1037 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_session_t:s0-s0:c0.c1023 msg='op=login acct="(unknown)" exe="/usr/libexec/openssh/sshd-session" hostname=? addr=UNKNOWN terminal=ssh res=failed'

To writing availability via serial:

  • Detecting listener on VSOCK:22 is not sufficient, because Fedora 43 is listening.
  • It requires detecting configuration of audit? or another refusing mechanism on other distros. And it must be detected whether they are set to interrupt the connection.

So, I think, it is more reliable to actually try the connection with the SSH protocol, and the portability is also high.
And using ssh installed on the host to test the connection to VSOCK will increase the same uncertainty as the execution of requirements that cause CI failure.

Copy link
Member

@AkihiroSuda AkihiroSuda Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And using ssh installed on the host to test the connection to VSOCK will increase the same uncertainty as the execution of requirements that cause CI failure.

Why can crypto/ssh avoid that uncertainty?
Can we just fix that uncertainty by executing /usr/bin/ssh with some option?

Copy link
Contributor Author

@norio-nomura norio-nomura Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because x/crpyto/ssh does not use inter-process communication.
I think stderr=\"\": read |0: bad file descriptor" that I'm trying to fix in #4333 is an error in pipe used for inter-process communication. I don't think it can be fixed with the option to pass to /usr/bin/ssh.
It might be fixed if the executing method of /usr/bin/ssh changes, but it seemed better to reduce external dependencies at the run time.
After trying #4333 and #4337 so far, I thought x/crypto/ssh would be reliable enough to execute requirements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be fixed if the executing method of /usr/bin/ssh changes, but it seemed better to reduce external dependencies at the run time.

limactl shell definitely depends on /usr/bin/ssh though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limactl shell definitely depends on /usr/bin/ssh though

It can also be implemented in x/crypto/ssh. There's no reason for us to do that.
All SSH communication doesn't have to depend on either /usr/bin/ssh or x/crypto/ssh.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lima still has to generate ~/.lima/INST/ssh.config for interoperability with external programs, and has to verify that the generated config works, regardless to the limactl shell implementation uses it or not.

@norio-nomura
Copy link
Contributor Author

norio-nomura commented Nov 20, 2025

  • CodeQL: Exclude go/insecure-hostkeycallback

This has been dropped since GitHub Advanced Security does not affect that configuration.

Instead of them, added:

  • pkg/sshutil: Use hostKeyCollector().checker() instead of ssh.InsecureIgnoreHostKey()
    hostKeyCollector().checker():
    checker returns a HostKeyCallback that either checks and collects the host key,
    or only checks the host key, depending on whether any host keys have been collected.
    It is expected to pass host key checks by retrying after the first collection.
    On second invocation, it will only check the host key.

This implementation becomes much simpler than 3ae0d29 :

  • Dropped: generating host key and passing it to the guest
  • Dropped: generating, using, and updating ssh_known_hosts

@norio-nomura norio-nomura force-pushed the vz-ssh-over-vsock-try-handshake branch from 3cadc6b to d08220b Compare November 20, 2025 05:20
Check the SSH server in a way that complies with the SSH protocol using x/crypto/ssh.
This change fixes lima-vm#4334 by falling back to usernet port forwarder on failing SSH connections over VSOCK.

- pkg/networks/usernet: Rename entry point from `/extension/wait_port` to `/extension/wait-ssh-server`
  Because it changed to an SSH server-specific entry point.
  When a client accesses the old entry point, it fails and continues with falling back to the usernet forwarder.

- pkg/sshutil: Add `WaitSSHReady()`
  WaitSSHReady waits until the SSH server is ready to accept connections.
  The dialContext function is used to create a connection to the SSH server.
  The addr, user parameter is used for ssh.ClientConn creation.
  The timeoutSeconds parameter specifies the maximum number of seconds to wait.

Signed-off-by: Norio Nomura <norio.nomura@gmail.com>

# Conflicts:
#	go.mod

# Conflicts:
#	go.mod
…ureIgnoreHostKey()`

- `hostKeyCollector().checker()`:
checker returns a HostKeyCallback that either checks and collects the host key,
or only checks the host key, depending on whether any host keys have been collected.
It is expected to pass host key checks by retrying after the first collection.
On second invocation, it will only check the host key.

The code that uses `ssh.InsecureIgnoreHostKey()` in `x/crypto/ssh` is pointed out in CodeQL as `Use of insecure HostKeyCallback implementation (High)`, so it is an implementation to avoid this.

Signed-off-by: Norio Nomura <norio.nomura@gmail.com>
@norio-nomura norio-nomura force-pushed the vz-ssh-over-vsock-try-handshake branch from d08220b to fb899c1 Compare November 20, 2025 08:40
@norio-nomura
Copy link
Contributor Author

In this PR, it detects that SSH over VSOCK cannot be connected to the Fedora 43 VM as follows, and it will fall back via usernet.

INFO[0010] [hostagent] Failed to detect SSH server on vsock port, falling back to usernet forwarder error="failed to create ssh.Conn to "vsock:22": ssh: handshake failed: EOF"

@AkihiroSuda AkihiroSuda requested a review from a team November 21, 2025 01:48
@AkihiroSuda
Copy link
Member

AkihiroSuda commented Nov 21, 2025

We may consider this PR for v2.1+, but for v2.0.2 let's merge the PR below to avoid overengineering:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Template podman/fedora is no longer able to SSH

2 participants